Development of risk models of incident hypertension using machine learning on the HUNT study data

In this study, we aimed to create an 11-year hypertension risk prediction model using data from the Trøndelag Health (HUNT) Study in Norway, involving 17 852 individuals (20–85 years; 38% male; 24% incidence rate) with blood pressure (BP) below the hypertension threshold at baseline (1995–1997). We assessed 18 clinical, behavioral, and socioeconomic features, employing machine learning models such as eXtreme Gradient Boosting (XGBoost), Elastic regression, K-Nearest Neighbor, Support Vector Machines (SVM) and Random Forest. For comparison, we used logistic regression and a decision rule as reference models and validated six external models, with a focus on the Framingham risk model. The top-performing models consistently included XGBoost, Elastic regression and SVM. These models efficiently identified hypertension risk, even among individuals with optimal baseline BP (< 120/80 mmHg), although the improvement over the reference models was modest. The recalibrated Framingham risk model outperformed the reference models, approaching the best-performing ML models. Important features included age, systolic and diastolic BP, body mass index, height, and family history of hypertension. In conclusion, our study demonstrated that linear effects sufficed for a well-performing model. The best models efficiently predicted hypertension risk, even among those with optimal or normal baseline BP, using few features. The recalibrated Framingham risk model proved effective in our cohort.


Background and objectives 3a
Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models.
3b Specify the objectives, including whether the study describes the development or validation of the model or both.

Methods

Source of data 4a
Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable.
4b Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up.

Participants 5a
Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centres.
5b Describe eligibility criteria for participants.
5c Give details of treatments received, if relevant.

Outcome 6a
Clearly define the outcome that is predicted by the prediction model, including how and when assessed.
6b Report any actions to blind assessment of the outcome to be predicted.

Predictors 7a
Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured.
7b Report any actions to blind assessment of predictors for the outcome and other predictors.

Sample size 8
Explain how the study size was arrived at.

Missing data 9
Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method.

Statistical analysis methods 10a
Describe how predictors were handled in the analyses.
10b Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation.
10d Specify all measures used to assess model performance and, if relevant, to compare multiple models.

Risk groups 11
Provide details on how risk groups were created, if done.

Participants 13a
Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful.
13b Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome.

Model development 14a
Specify the number of participants and outcome events in each analysis.
14b If done, report the unadjusted association between each candidate predictor and outcome.

Model specification 15a
Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point).
15b Explain how to use the prediction model.

Model performance 16
Report performance measures (with CIs) for the prediction model.

Limitations 18
Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data).

Interpretation 19b
Give an overall interpretation of the results, considering objectives, limitations, and results from similar studies, and other relevant evidence.

Implications 20
Discuss the potential clinical use of the model and implications for future research.

Other information

Supplementary information 21
Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets.

Funding 22
Give the source of funding and the role of the funders for the present study.

We recommend using the TRIPOD Checklist in conjunction with the TRIPOD Explanation and Elaboration document.

Supplementary Note
To create risk predictions using the external Framingham risk model and its refitted versions, two or three adaptations had to be made to the models due to mismatches in feature definitions. Instead of the number of parents with hypertension used in the Framingham, CAVAS, and F-CAVAS risk models, we used family history of hypertension. We accommodated this by multiplying the model coefficient by the average number of parents with hypertension, if any, for the three models. This was 1.6394 for the Framingham risk model, and 1.108 for the CAVAS and F-CAVAS models.
The feature 'parental hypertension' in the TLGS and KoGES models was simply replaced with family history of hypertension. Further, 'Formerly daily smoker' and 'Never' in the HUNT Study were encoded as 'No', and 'Daily smoker' as 'Yes', for the feature 'Current smoker' in all models. Lastly, to accommodate the risk horizon of 11 years in this study, we calculated the risk of an event within 11 years, i.e., using t = 11 in the Weibull equations. All external model equations used are shown in Supplementary Table S3.
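The adaptations described above can be sketched in a few lines of Python. This is a minimal illustration only: the accelerated-failure-time form of the Weibull risk equation is an assumption about how the external models are parameterized, the function names are hypothetical, and the numeric inputs in the usage note are placeholders, not coefficients from Supplementary Table S3.

```python
import math


def weibull_risk(linear_predictor, scale, t=11.0):
    """Risk of an event within t years.

    Assumes a Weibull accelerated-failure-time form,
    risk = 1 - exp(-exp((ln(t) - linear_predictor) / scale)),
    which is an assumption of this sketch, not a quoted model equation.
    """
    z = (math.log(t) - linear_predictor) / scale
    return 1.0 - math.exp(-math.exp(z))


def adapted_family_history_term(coef_per_parent, avg_parents, family_history):
    """Contribution of family history (0/1) when the original model used the
    number of hypertensive parents: the per-parent coefficient is multiplied
    by the average number of hypertensive parents (e.g., 1.6394 or 1.108)."""
    return coef_per_parent * avg_parents * family_history


def encode_current_smoker(hunt_smoking_status):
    """Map the HUNT smoking levels to the binary 'Current smoker' feature."""
    return 1 if hunt_smoking_status == "Daily smoker" else 0
```

For example, `weibull_risk(3.0, 0.9, t=11)` gives the 11-year risk for a placeholder linear predictor of 3.0 and a placeholder scale of 0.9; with the same inputs, a shorter horizon such as `t=4` yields a lower risk.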
For recalibration of the Framingham risk model to the HUNT Study data, we used the linear predictor of the adapted Framingham risk model as a single predictor in a logistic regression model. We fixed the slope to 1 and only estimated an intercept, i.e., an offset, from the HUNT Study data, following Method 1 described by Moons et al. [1]. See Supplementary Table S3 for the exact risk equation used and adaptations made.
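The intercept-only recalibration can be illustrated as follows. This is a sketch under stated assumptions, not the code used in the study: the function names are hypothetical, the intercept is estimated by a simple Newton iteration on the logistic log-likelihood, and the slope on the linear predictor is held at 1 by treating it as an offset.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_offset_intercept(linear_predictors, outcomes, n_iter=50, tol=1e-10):
    """Maximum-likelihood intercept a for logit(p) = a + 1 * lp.

    The slope is fixed at 1 (lp enters as an offset); only a is estimated.
    """
    a = 0.0
    for _ in range(n_iter):
        probs = [sigmoid(a + lp) for lp in linear_predictors]
        # Score (first derivative) and information (negative second derivative).
        gradient = sum(y - p for y, p in zip(outcomes, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        a += step
        if abs(step) < tol:
            break
    return a


def recalibrated_risk(linear_predictor, intercept):
    """Recalibrated probability for one individual."""
    return sigmoid(intercept + linear_predictor)
```

A property of this fit is that the mean recalibrated risk equals the observed event proportion in the data used for estimation, which is why the approach corrects systematic over- or underestimation without changing the ranking of individuals.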
Supplementary Table S3. External risk models with adaptations and the recalibration of the Framingham risk model to the HUNT Study data. To accommodate the risk horizon of 11 years, t = 11 is used in the Weibull regression for all models. For the feature 'Smoking', the levels 'Formerly daily smoker' and 'Never' in the HUNT Study were both encoded as 'No', and 'Daily smoker' encoded as 'Yes'. For features 'Female', 'Smoking' and 'Fam. hist. of hyp.', levels 'Yes' and 'No' were set as 1 or 0, respectively, when used in model equations. Remaining features were set as their numeric value. * The feature 'Number of parents with hypertension' in the models was replaced with 'Family history of hypertension' and its coefficient multiplied by the average number of parents with hypertension in the respective development cohorts, i.e., 1.6394 in the Framingham model, and 1.108 in the KoGES models. 'BMI': Body Mass Index, 'BP': Blood pressure, 'Fam. hist. of hyp.': Family history of hypertension.

Supplementary Discussion
The overestimation of risk by the Framingham risk model might be due to differences in study and cohort baseline characteristics, shown in Supplementary Table S4. Notably, the incidence rate, systolic blood pressure and smoking differed markedly between the two cohorts. However, in the Framingham Offspring Study, an individual could contribute multiple records to the data set due to multiple follow-ups, until they either presented with hypertension at an assessment or were censored at the end of the study. This means that while 1717 individuals were included in the study, a total of 5814 records were used. With 796 outcomes, only 13.7% of the included records had hypertension as the outcome, in contrast to an incidence rate per individual of 45%. For systolic blood pressure and smoking, the inclusion of multiple follow-ups means that the baseline data characteristics might not be representative of the 5814 records that were used, e.g., individuals may have stopped smoking or experienced changes to their systolic BP between assessments.
Parental, or familial, history of hypertension differed between the studies, which may reflect differences in how that variable was recorded. In the Framingham Offspring Study, the parents' history of hypertension was recorded as part of another study, meaning the information is more precise than that recorded in the HUNT Study, which relied on participants reporting their parents' hypertension history via questionnaires. However, parental/familial history of hypertension was associated with increased risk in the Framingham risk model so, in isolation, we would expect lower risk estimates for the HUNT Study data as it had a lower rate. This was not the case, as the Framingham risk model overestimated risk for the HUNT Study data. Lastly, we note that a higher proportion of women was included in the HUNT Study. In short, based on the study and cohort characteristics, it is not clear why the Framingham risk model overestimated risk for the HUNT Study cohort.
Supplementary Table S4

Supplementary Method
The original dataset made available from the HUNT Study had 5 087 (22%) records with missing entries in their feature data. However, the proportion of missing entries was low for most features, see Supplementary Table S9. Only 'family history of hypertension' exceeded 10%, with 'family history of CVD', 'socioeconomic status' and 'physical activity' being the only others exceeding 1%. In summary, 4 441 individuals were missing one entry, 568 were missing two entries, 68 were missing three, nine were missing four, and one was missing five. In testing for differences between the datasets with and without those that had missing feature entries, we found significant differences in means or proportions for 'age', 'sex', and 'socioeconomic status'. However, all of these differences were small, and among the three features only 'socioeconomic status' had any missing values, with 1.4% missing entries.
Ideally, the missing feature entries would have been imputed using multiple imputation [7]. Instead, we removed the affected records from the dataset for our main analysis and used only individuals that had complete data available. This was motivated by reducing the time and computational burden imposed by the multiple imputation procedure. As a sensitivity analysis, we used Multiple Imputation by Chained Equations (MICE) to impute our data and develop risk models. To compare changes in performance, we calculated performance measures four ways: models fitted with and without imputed data were each evaluated on the test set with and without individuals with imputed entries.
We reduced the hyperparameter search space and chose a subset of methods that we empirically found to be faster to fit. Using the MICE procedure on the original dataset with missing feature entries, we performed model development using the subset of methods and hyperparameters. We sampled individuals with missing data in a 7:3 ratio, adding them to the already defined training and testing sets used in the main analysis. Model development was done by selecting hyperparameters with four-fold cross-validation. We applied MICE within the cross-validation routine to avoid data leakage, i.e., using only the training folds to learn the imputation parameters at each iteration. We generated 20 imputed versions of the datasets and used 10 imputation iterations whenever MICE was applied. After model development, MICE was applied again to learn parameters from the full training set to impute the test set. Results from the four-way evaluations are shown in Supplementary Table S10, where the results from the main analysis are also included.
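The principle of learning imputation parameters from the training folds only can be sketched as below. To keep the example short and dependency-free, per-feature mean imputation is used as a stand-in for MICE; this substitution, the helper names, and the fold assignment are all assumptions of the sketch and do not reproduce the study's procedure (which used 20 imputed datasets and 10 iterations).

```python
import random
import statistics


def fit_mean_imputer(rows):
    """Learn per-column means from the observed (non-None) values only."""
    n_cols = len(rows[0])
    return [
        statistics.fmean([r[j] for r in rows if r[j] is not None])
        for j in range(n_cols)
    ]


def apply_imputer(rows, means):
    """Fill missing entries (None) with the previously learned column means."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def cv_folds(n, k=4, seed=0):
    """Randomly partition the indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def impute_within_cv(rows, k=4, seed=0):
    """Yield (train, validation) pairs, imputed per fold without leakage."""
    for held_out in cv_folds(len(rows), k, seed):
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        valid = [rows[i] for i in held_out]
        # Imputation parameters come from the training folds only ...
        means = fit_mean_imputer(train)
        # ... and are then applied to both the training and validation folds.
        yield apply_imputer(train, means), apply_imputer(valid, means)
```

The same pattern applies to the final step described above: the imputer is fitted once on the full training set and then applied to the test set, so that no information from the test records enters the imputation parameters.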
The pattern was reminiscent of the results from the main analysis: most models performed well, with ML models outperforming the logistic regression model and the high normal BP rule. The Framingham risk model was again worse than the ML models but better than the reference models on discrimination. However, recalibration was required to achieve acceptable calibration for the Framingham risk model.
Supplementary Table S5. Feature distributions of the complete data set used in the main analysis, stratified by outcome status.

TRIPOD Checklist: Prediction Model Development

Supplementary Figure S1. Flow of data used in analyses. Individuals in the HUNT Study data supplied one record each. The "7 : 3" refers to splits of the data done by random sampling in a 7:3 ratio. The selection criteria 'BP < …/…' refer to baseline blood pressure for individuals in the test set. Note that some records fulfilled multiple exclusion criteria. In addition, some individuals have withdrawn their consent to participate in the HUNT Study, hence the available number of records is lower than that reported in earlier cohort profiles. 'BP': Blood pressure, 'CVD': Cardiovascular disease.

Hyperparameters selected for different methods in cross-validation.
Column headings: Method, method name (a); Hyperparameter (a); Value selected; Grid; Search strategy.
Grids were derived from default options for random search used in the 'caret' package. Hyperparameter and model naming follow the conventions of the 'caret' package. Grid dimensions are reported as sets of numbers, e.g., {a, b, c, d}, or intervals, e.g., [x, y]. Numbers were rounded to five decimals. (a) Name given in the caret software package. 'exp': the exponential function, 'KNN': K-Nearest Neighbors, 'SVM': Support Vector Machines, 'XGBoost': eXtreme Gradient Boosting.

Study and data characteristics for the Framingham Offspring Study cohort used to develop the Framingham risk model and the HUNT Study cohort used in this study.
Column headings: Characteristics; Framingham Offspring Study; HUNT Study II.
Reported as mean (standard deviation) for numerical features, and count (percentage) for categorical features. P values were rounded to four decimals. * Significant after applying Holm's step-down correction with α = 0.05. (a) Two-tailed Welch t-test for a difference in mean value for numerical features or chi-square test for a difference in proportions in categorical features. Each individual feature was tested as the group of individuals having normal BP (<130/85 mmHg) versus the group of individuals having high normal BP (130/85 mmHg ≤ BP < 140/90 mmHg) at baseline. 'BMI': Body Mass Index, 'CVD': Cardiovascular disease.
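For readers unfamiliar with Holm's step-down correction mentioned in the footnote, a minimal sketch follows. The function name is hypothetical and the p-values in the usage note are invented for illustration; they are not results from this study.

```python
def holm_significant(p_values, alpha=0.05):
    """Return one boolean per p-value: True where Holm's step-down
    procedure rejects the corresponding null hypothesis."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        # The threshold loosens at each step: alpha/m, alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # Once one test fails, all larger p-values fail too.
    return rejected
```

With the illustrative input `[0.01, 0.04, 0.03, 0.005]`, the two smallest p-values pass their thresholds (0.0125 and 0.0167), whereas 0.03 exceeds 0.025, so the procedure stops and the remaining two are not marked significant.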